Likelihood and Smoothing Spline Analysis of Variance

Authors

  • Grace Wahba
  • Chong Gu
  • Yuedong Wang
Abstract

We discuss a class of methods for the problem of 'soft' classification in supervised learning. In 'hard' classification, it is assumed that any two examples with the same attribute vector will always be in the same class (or have the same outcome), whereas in 'soft' classification, two examples with the same attribute vector do not necessarily have the same outcome, but the probability of a particular outcome does depend on the attribute vector. In this paper we describe a family of methods well suited for estimating this probability. The method produces, for any value in a (reasonable) region of the attribute space, an estimate of the probability that the next example will be in class 1. Underlying these methods is the assumption that this probability varies smoothly (in a sense to be defined) as the predictor variables vary. The method combines results from penalized log likelihood estimation, smoothing splines, and analysis of variance to obtain the PSA class of methods. In the process of describing PSA we discuss some issues concerning the computation of degrees of freedom for signal, which has wider ramifications for the minimization of generalization error in machine learning. As an illustration we apply the method to the Pima Indian Diabetes data set in the UCI Repository and compare the results to those of Smith et al. (1988), who used the ADAP learning algorithm on the same data set to forecast the onset of diabetes mellitus. If the probabilities we obtain are thresholded to make a hard classification comparable with that of Smith et al. (1988), the results are very similar; however, the intermediate probabilities that we obtain provide useful and interpretable information on how the risk of diabetes varies with some of the risk factors.
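The full PSA estimate requires the reproducing-kernel machinery of smoothing spline ANOVA, which is beyond a short snippet. As a loose, one-dimensional illustration of the penalized log likelihood idea only (not the authors' method), the sketch below fits a penalized logistic spline by Newton iterations: it maximizes the binomial log likelihood minus a roughness penalty on the spline coefficients. All function names and the truncated-power basis are assumptions introduced for this example.

```python
import numpy as np

def truncated_power_basis(x, knots):
    # Columns: 1, x, and truncated cubic terms (x - k)_+^3 at each knot.
    # (An illustrative basis, not the SSANOVA reproducing-kernel basis.)
    cols = [np.ones_like(x), x]
    for k in knots:
        cols.append(np.clip(x - k, 0.0, None) ** 3)
    return np.column_stack(cols)

def fit_penalized_logistic_spline(x, y, lam=1.0, n_knots=8, n_iter=50):
    # Maximize sum_i [y_i * f(x_i) - log(1 + e^{f(x_i)})] - lam * beta' P beta
    knots = np.quantile(x, np.linspace(0.1, 0.9, n_knots))
    B = truncated_power_basis(x, knots)
    P = np.eye(B.shape[1])
    P[0, 0] = P[1, 1] = 0.0        # leave intercept and linear term unpenalized
    beta = np.zeros(B.shape[1])
    for _ in range(n_iter):
        eta = B @ beta
        p = 1.0 / (1.0 + np.exp(-eta))   # current probability estimates
        w = p * (1.0 - p)                # IRLS weights
        # Newton step on the penalized log likelihood
        H = B.T @ (B * w[:, None]) + 2.0 * lam * P
        g = B.T @ (y - p) - 2.0 * lam * (P @ beta)
        step = np.linalg.solve(H, g)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta, knots

def predict_prob(beta, knots, x_new):
    # Estimated probability that the next example at x_new is in class 1
    eta = truncated_power_basis(x_new, knots) @ beta
    return 1.0 / (1.0 + np.exp(-eta))
```

In practice the smoothing parameter `lam` would itself be chosen from the data (e.g. by a cross-validation criterion), which is where the degrees-of-freedom-for-signal issues discussed in the paper enter.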


Similar Articles

A note on bimodality in the log-likelihood function for penalized spline mixed models

For a smoothing spline or general penalized spline model, the smoothing parameter can be estimated using residual maximum likelihood (REML) methods by expressing the spline in the form of a mixed model. The possibility of bimodality in the profile log-likelihood function for the smoothing parameter of these penalized spline mixed models is demonstrated. A canonical transformation into independe...


Model Selection in Linear Mixed Models Using MDL Criterion with an Application to Spline Smoothing

For spline smoothing one can rewrite the smooth estimation as a linear mixed model (LMM) where the smoothing parameter appears as the variance of spline basis coefficients. Smoothing methods that use basis functions with penalization can utilize maximum likelihood (ML) theory in the LMM framework ([8]). We introduce the minimum description length (MDL) model selection criterion in LMM and propose a...


Structured Machine Learning for Soft Classification with Smoothing Spline ANOVA and Stacked Tuning, Testing, and Evaluation

We describe the use of smoothing spline analysis of variance (SSANOVA) in the penalized log likelihood context, for learning (estimating) the probability p of a '1' outcome, given a training set with attribute vectors and outcomes. p is of the form p(t) = e^{f(t)}/(1 + e^{f(t)}), where, if t is a vector of attributes, f is learned as a sum of smooth functions of one attribute plus a sum of smooth fu...
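The snippet above recovers p from the estimated logit f via the logistic transform p(t) = e^{f(t)}/(1 + e^{f(t)}). A minimal illustration of this transform and its inverse (helper names are ours, for illustration only):

```python
import math

def prob_from_logit(f):
    # p = e^f / (1 + e^f): maps any real logit f to a probability in (0, 1)
    return math.exp(f) / (1.0 + math.exp(f))

def logit(p):
    # Inverse transform: f = log(p / (1 - p)), for p strictly in (0, 1)
    return math.log(p / (1.0 - p))
```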


Smoothing Spline Estimation of Variance Functions

This article considers spline smoothing of variance functions. We focus on selection of smoothing parameters and develop three direct data-driven methods: unbiased risk (UBR), generalized approximate cross validation (GACV) and generalized maximum likelihood (GML). In addition to guaranteed convergence, simulations show that these direct methods perform better than existing indirect UBR, genera...


Use of Two Smoothing Parameters in Penalized Spline Estimator for Bi-variate Predictor Non-parametric Regression Model

Penalized spline criteria involve a goodness-of-fit term and a penalty term, where the penalty function contains smoothing parameters. These control the smoothness of the curve and work together with the knot locations and the spline degree. The regression function with two predictors in the non-parametric model will have two different non-parametric regression functions. Therefore, we...


Automatic Generalized Nonparametric Regression via Maximum Likelihood

A relatively recent development in nonparametric regression is the representation of spline-based smoothers as mixed model fits. In particular, generalized nonparametric regression (e.g. smoothing with a binary response) corresponds to fitting a generalized linear mixed model. Automation, or data-driven smoothing parameter selection, can be achieved via (restricted) maximum likelihood estimation ...



Journal:

Volume   Issue

Pages  -

Publication date: 1993